Parallel Inductive Logic in Data Mining
Abstract
Data mining is the process of automatically extracting novel, useful, and understandable patterns from very large databases. High-performance, scalable, and parallel computing algorithms are crucial in data mining as datasets grow inexorably in size and complexity. Inductive logic is a research area at the intersection of machine learning and logic programming that has recently been applied to data mining. It studies learning from examples within the framework provided by clausal logic, and it offers a uniform and highly expressive means of representation: all examples, the background knowledge, and the induced theory are expressed in first-order logic. However, such an expressive representation is often computationally expensive. This report first presents the background for parallel data mining, the BSP model, and inductive logic programming. Building on this study, it gives an approach to parallel inductive logic in data mining that solves the potential performance problem. Both a parallel algorithm and a cost analysis are provided. Applied to a number of problems, the approach shows super-linear speedup. To justify this analysis, I implemented a parallel version of a core ILP system, Progol, in C with the support of the BSP parallel model. Three test cases are provided, and a double speedup phenomenon is observed on all these datasets and on two different parallel computers.
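The abstract leans on two quantitative notions: the BSP cost model and super-linear speedup. In the standard BSP model a computation proceeds in supersteps, and a superstep costs w + h·g + l, where w is the largest local work on any processor, h is the largest number of words any processor sends or receives, g is the per-word communication cost, and l is the barrier-synchronization cost; a program's cost is the sum over its supersteps. Speedup on p processors is T_seq / T_par, and it is called super-linear when it exceeds p. The Python sketch here only illustrates that arithmetic under those textbook definitions. It is not the report's cost analysis or its C/BSP implementation of Progol, and the names `Superstep`, `superstep_cost`, `bsp_cost`, `speedup`, and `is_super_linear`, as well as every number used, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Superstep:
    """One BSP superstep, described per processor (hypothetical helper type)."""
    work: Sequence[float]    # local computation done by each processor
    sent: Sequence[int]      # words each processor sends in this superstep
    received: Sequence[int]  # words each processor receives in this superstep


def superstep_cost(step: Superstep, gap: float, latency: float) -> float:
    """Standard BSP superstep cost w + h*g + l (gap = g, latency = l)."""
    w = max(step.work)                           # the slowest processor bounds the superstep
    h = max(max(step.sent), max(step.received))  # h-relation: largest fan-out or fan-in
    return w + h * gap + latency


def bsp_cost(steps: Sequence[Superstep], gap: float, latency: float) -> float:
    """Total cost of a BSP program: the sum of its superstep costs."""
    return sum(superstep_cost(s, gap, latency) for s in steps)


def speedup(sequential_time: float, parallel_time: float) -> float:
    """Classic speedup ratio T_seq / T_par."""
    return sequential_time / parallel_time


def is_super_linear(s: float, p: int) -> bool:
    """Speedup is super-linear when it exceeds the number of processors p."""
    return s > p


if __name__ == "__main__":
    # Made-up figures for p = 4 processors; these are NOT measurements from the report.
    steps = [
        Superstep(work=[100, 120, 90, 110], sent=[10, 5, 8, 7], received=[6, 9, 7, 8]),
        Superstep(work=[20, 20, 20, 20], sent=[1, 1, 1, 1], received=[3, 0, 0, 1]),
    ]
    t_par = bsp_cost(steps, gap=4, latency=50)
    print(t_par)  # 292: (120 + 10*4 + 50) + (20 + 3*4 + 50)
    s = speedup(1200, t_par)
    print(f"{s:.2f}", is_super_linear(s, p=4))  # 4.11 True
```

Taking w and h as maxima rather than sums reflects the BSP assumption that every processor waits at the barrier for the slowest one, so a single overloaded processor sets the cost of the whole superstep.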
Similar sources
Parallel Inductive Logic for Data Mining
Data mining is the process of automatic extraction of novel, useful and understandable patterns in very large databases. High-performance, scalable, and parallel computing algorithms are crucial in data mining as datasets grow in size and complexity. Inductive logic is a research area in the intersection of machine learning and logic programming, which has been recently applied to data mining. ...
Inductive Logic Programming for Bioinformatics in Prova
This paper describes the inductive logic programming (ILP) features of Prova, a state-of-the-art distributed Semantic Web and Life Science inference service system and architecture for multi-relational data mining of complex Life Science phenomena such as complex biological relationships. The proposed novel design artifact implements typical ILP inference formalisms for rule-based generalization an...
Distributed Generative Data Mining
A process of Knowledge Discovery in Databases (KDD) involving large amounts of data requires a considerable amount of computational power. The process may be done on dedicated and expensive machinery or, for some tasks, one can use distributed computing techniques on a network of affordable machines. In either approach it is usual for the user to specify the workflow of the sub-tasks composing th...
IndLog - Induction in Logic
IndLog is a general-purpose Prolog-based Inductive Logic Programming (ILP) system. It is theoretically based on Mode Directed Inverse Entailment and has several distinguishing features that make it adequate for a wide range of applications. To search efficiently through large hypothesis spaces, IndLog uses original features like lazy evaluation of examples and Language Level Search. IndLog...
An Inductive Logic Programming Query Language for Database Mining
First, a short introduction to inductive logic programming and machine learning is presented and then an inductive database mining query language RDM (Relational Database Mining language). RDM integrates concepts from inductive logic programming, constraint logic programming, deductive databases and meta-programming into a flexible environment for relational knowledge discovery in databases. Th...
Publication date: 2000